Initial Exploration of the P/NBD Model
In this workbook we introduce the various different BTYD models, starting with a discussion of the underlying theory.
1 Background Theory
Before we start working on fitting and using the various Buy-Till-You-Die models, we first need to discuss the basic underlying theory and model.
In this model, we assume a customer becomes ‘alive’ to the business at the first purchase and then makes purchases stochastically but at a steady-rate for a period of time, and then ‘dies’ - i.e. becomes inactive to the business - hence the use of “Buy-Till-You-Die”.
Thus, at a high level these models decompose into modelling the transaction events using distributions such as the Poisson or Negative Binomial, and then modelling the ‘dropout’ process using some other method.
A number of BTYD models exist and for this workshop we will focus on the BG/NBD model - the Beta-Geometric Negative Binomial Distribution model (though we will discuss the P/NBD model also).
These models require only two pieces of information about each customer’s purchasing history: the “recency” (when the last transaction occurred) and “frequency” (the count of transactions made by that customer in a specified time period).
The notation used to represent this information is
\[ X = (x, \, t_x, \, T), \] where
\[ \begin{eqnarray*} x &=& \text{the number of transactions}, \\ T &=& \text{the observed time period}, \\ t_x &=& \text{the time since the last transaction}. \end{eqnarray*} \]
From this summary data we can fit most BTYD models.
2 BTYD Models
There are a number of different statistical approaches to building BTYD models - relying on a number of different assumptions about how the various recency, frequency and monetary values are modelled.
We now discuss a number of different ways of modelling this.
2.1 Pareto/Negative-Binomial Distribution (P/NBD) Model
The P/NBD model relies on five assumptions:
- While active, the number of transactions made by a customer follows a Poisson process with transaction rate \(\lambda\).
- Heterogeneity in \(\lambda\) follows a Gamma distribution \(\Gamma(\lambda \, | \, \alpha, r)\) with shape \(r\) and rate \(\alpha\).
- Each customer has an unobserved ‘lifetime’ of length \(\tau\). This point at which the customer becomes inactive is distributed as an exponential with dropout rate \(\mu\).
- Heterogeneity in dropout rates across customers follows a Gamma distribution \(\Gamma(\mu \, | \, s, \beta)\) with shape parameter \(s\) and rate parameter \(\beta\).
- The transaction rate \(\lambda\) and the dropout rate \(\mu\) vary independently across customers.
As before, we express this in mathematical notation as:
\[ \begin{eqnarray*} \lambda &\sim& \Gamma(\alpha, r), \\ \mu &\sim& \Gamma(s, \beta), \\ \tau &\sim& \text{Exponential}(\mu) \end{eqnarray*} \]
2.2 Beta-Geometric/Negative-Binomial Distribution (BG/NBD) Model
This model relies on a number of base assumptions, somewhat similar to the P/NBD model but modelling lifetime with a different process:
- While active, the number of transactions made by a customer follows a Poisson process with transaction rate \(\lambda\).
- Heterogeneity in \(\lambda\) follows a Gamma distribution \(\Gamma(\lambda \, | \, \alpha, r)\) with parameters shape \(r\) and rate \(\alpha\).
- After any transaction, a customer becomes inactive with probability \(p\).
- Heterogeneity in \(p\) follows a Beta distribution \(B(p \, | \, a, b)\) with shape parameters \(a\) and \(b\).
- The transaction rate \(\lambda\) and the dropout probability \(p\) vary independently across customers.
Note that it follows from the above assumptions that the probability of a customer being ‘alive’ after any transaction is given by the Geometric distribution, and hence the Beta-Geometric in the name.
To put this into more formal mathematical notation, we have:
\[ \begin{eqnarray*} \lambda &\sim& \Gamma(\alpha, r), \\ P(\text{alive}, k) &\sim& \text{Geometric}(p, k), \\ p &\sim& \text{Beta}(a, b) \end{eqnarray*} \]
3 Initial P/NBD Models
We start by modelling the P/NBD model using our synthetic datasets before we try to model real-life data.
3.1 Load Long Time-frame Synthetic Data
customer_cohortdata_tbl <- read_rds("data/synthdata_longframe_cohort_tbl.rds")
customer_cohortdata_tbl %>% glimpse()## Rows: 50,000
## Columns: 4
## $ customer_id [3m[38;5;246m<chr>[39m[23m "C201101_0001", "C201101_0002", "C201101_0003", "C20110…
## $ cohort_qtr [3m[38;5;246m<chr>[39m[23m "2011 Q1", "2011 Q1", "2011 Q1", "2011 Q1", "2011 Q1", …
## $ cohort_ym [3m[38;5;246m<chr>[39m[23m "2011 01", "2011 01", "2011 01", "2011 01", "2011 01", …
## $ first_tnx_date [3m[38;5;246m<date>[39m[23m 2011-01-01, 2011-01-01, 2011-01-01, 2011-01-01, 2011-0…
customer_simparams_tbl <- read_rds("data/synthdata_longframe_simparams_tbl.rds")
customer_simparams_tbl %>% glimpse()## Rows: 50,000
## Columns: 9
## $ customer_id [3m[38;5;246m<chr>[39m[23m "C201101_0001", "C201101_0002", "C201101_0003", "C2011…
## $ cohort_qtr [3m[38;5;246m<chr>[39m[23m "2011 Q1", "2011 Q1", "2011 Q1", "2011 Q1", "2011 Q1",…
## $ cohort_ym [3m[38;5;246m<chr>[39m[23m "2011 01", "2011 01", "2011 01", "2011 01", "2011 01",…
## $ first_tnx_date [3m[38;5;246m<date>[39m[23m 2011-01-01, 2011-01-01, 2011-01-01, 2011-01-01, 2011-…
## $ customer_mu [3m[38;5;246m<dbl>[39m[23m 0.04783829, 0.10238990, 0.03607158, 0.10886867, 0.0871…
## $ customer_tau [3m[38;5;246m<dbl>[39m[23m 16.59526247, 3.98578810, 1.36755838, 16.12324067, 1.61…
## $ customer_lambda [3m[38;5;246m<dbl>[39m[23m 0.37963949, 0.08399882, 0.27328828, 0.32830425, 0.0561…
## $ customer_nu [3m[38;5;246m<dbl>[39m[23m 1.704865659, 0.162742489, 0.874737616, 1.005165071, 0.…
## $ customer_p [3m[38;5;246m<dbl>[39m[23m 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100,…
customer_transactions_tbl <- read_rds("data/synthdata_longframe_transactions_tbl.rds")
customer_transactions_tbl %>% glimpse()## Rows: 269,702
## Columns: 4
## $ customer_id [3m[38;5;246m<chr>[39m[23m "C201101_0002", "C201101_0018", "C201101_0001", "C201101…
## $ tnx_timestamp [3m[38;5;246m<dttm>[39m[23m 2011-01-01 00:20:28, 2011-01-01 00:43:10, 2011-01-01 02…
## $ invoice_id [3m[38;5;246m<chr>[39m[23m "T20110101-0001", "T20110101-0002", "T20110101-0003", "T…
## $ tnx_amount [3m[38;5;246m<dbl>[39m[23m 547.93, 90.67, 57.90, 143.08, 113.37, 22.20, 574.21, 112…
We re-produce the visualisation of the transaction times we used in previous workbooks.
plot_tbl <- customer_transactions_tbl %>%
group_nest(customer_id, .key = "cust_data") %>%
filter(map_int(cust_data, nrow) > 3) %>%
slice_sample(n = 30) %>%
unnest(cust_data)
ggplot(plot_tbl, aes(x = tnx_timestamp, y = customer_id)) +
geom_line() +
geom_point() +
labs(
x = "Date",
y = "Customer ID",
title = "Visualisation of Customer Transaction Times"
) +
theme(axis.text.y = element_text(size = 10))3.2 Derive the Log-likelihood Function
We now turn our attention to deriving the log-likelihood model for the P/NBD model.
We assume that we know that a given customer has made \(x\) transactions after the initial one over an observed period of time \(T\), and we label these transactions \(t_1\), \(t_2\), …, \(t_x\).
To model the likelihood for this observation, we need to consider two possibilities: one for where the customer is still ‘alive’ at \(T\), and one where the customer has ‘died’ by \(T\).
In the first instance, the likelihood is the product of the observations of each transaction, multiplied by the likelihood of the customer still being alive at time \(T\).
Because we are modelling the transaction counts as a Poisson process, this corresponds to the times between events following an exponential distribution, and so both the transaction times and the lifetime likelihoods use the exponential.
This gives us:
\[ \begin{eqnarray*} L(\lambda \, | \, t_1, t_2, ..., t_x, T, \, \tau > T) &=& \lambda e^{-\lambda t_1} \lambda e^{-\lambda(t_2 - t_1)} ... \lambda e^{-\lambda (t_x - t_{x-1})} e^{-\lambda(T - t)} \\ &=& \lambda^x e^{-\lambda T} \end{eqnarray*} \]
and we can combine this with the likelihood of the lifetime of the customer \(\tau\) being greater than the observation window \(T\),
\[ P(\tau > T \, | \, \mu) = e^{-\mu T} \]
For the second case, the customer becomes inactive at some time \(\tau\) in the interval \((t_x, T]\), and so the likelihood is
\[ \begin{eqnarray*} L(\lambda \, | \, t_1, t_2, ..., t_x, T, \, \tau > T) &=& \lambda e^{-\lambda t_1} \lambda e^{-\lambda(t_2 - t_1)} ... \lambda e^{-\lambda (t_x - t_{x-1})} e^{-\lambda(\tau - t_x)} \\ &=& \lambda^x e^{-\lambda \tau} \end{eqnarray*} \]
In both cases we do not need the times of the individual transactions, and all we need are the values \((x, t_x, T)\).
As we cannot observe \(\tau\), we want to remove the conditioning on \(\tau\) by integrating it out.
\[ \begin{eqnarray*} L(\lambda, \mu \, | \, x, t_x, T) &=& L(\lambda \, | \, t_1, t_2, ..., t_x, T, \, \tau > T) \, P(\tau > T \, | \, \mu) + \int^T_{t_x} L(\lambda \, | \, x, T, \text{ inactive at } (t_x, T] ) \, f(\tau \, | \mu) d\tau \\ &=& \lambda^x e^{-\lambda T} e^{\mu T} + \lambda^x \int^T_{t_x} e^{-\lambda \tau} \mu e^{-\mu \tau} d\tau \\ &=& \lambda^x e^{-(\lambda + \mu)T} + \frac{\lambda^x \mu}{\lambda + \mu} e^{-(\lambda + \mu) t_x} + \frac{\lambda^x \mu}{\lambda + \mu} e^{-(\lambda + \mu) T} \\ &=& \frac{\lambda^x \mu}{\lambda + \mu} e^{-(\lambda + \mu) t_x} + \frac{\lambda^{x+1} \mu}{\lambda + \mu} e^{-(\lambda + \mu) T} \end{eqnarray*} \]
In Stan, we do not calculate the likelihoods but the Log-likelihood, so we need
to take the log of this expression. This creates a problem, as we have no easy
way to calculate \(\log(a + b)\). As this expression occurs a lot, Stan provides
a log_sum_exp(), which is defined by
\[ \log \, (a + b) = \text{log_sum_exp}(\log a, \log b) \]
\[ \begin{eqnarray*} LL(\lambda, \mu \, | \, x, t_x, T) &=& \log \left( \frac{\lambda^x \, \mu}{\lambda + \mu} \left( e^{-(\lambda + \mu) t_x} + \lambda e^{-(\lambda + \mu) T} \right) \right) \\ &=& x \log \lambda + \log \mu -log(\lambda + \mu) + \text{log_sum_exp}(-(\lambda + \mu) \, t_x, \; \log \lambda - (\lambda + \mu) \, T) \end{eqnarray*} \]
This is the log-likelihood model we want to fit in Stan.
3.3 Construct Datasets
Having loaded the synthetic data we need to construct a number of datasets of derived values.
customer_summarystats_tbl <- customer_transactions_tbl %>%
calculate_transaction_cbs_data(last_date = as.Date("2018-12-31"))
customer_summarystats_tbl %>% glimpse()## Rows: 44,306
## Columns: 6
## $ customer_id [3m[38;5;246m<chr>[39m[23m "C201101_0001", "C201101_0002", "C201101_0003", "C20110…
## $ first_tnx_date [3m[38;5;246m<dttm>[39m[23m 2011-01-01 02:19:57, 2011-01-01 00:20:28, 2011-01-01 1…
## $ last_tnx_date [3m[38;5;246m<dttm>[39m[23m 2011-03-15 15:59:06, 2011-01-20 08:47:25, 2011-01-01 1…
## $ x [3m[38;5;246m<dbl>[39m[23m 6, 1, 0, 4, 0, 3, 0, 2, 5, 3, 1, 1, 23, 0, 1, 2, 1, 22,…
## $ t_x [3m[38;5;246m<dbl>[39m[23m 10.5098368, 2.7645783, 0.0000000, 11.2458596, 0.0000000…
## $ T_cal [3m[38;5;246m<dbl>[39m[23m 417.2718, 417.2837, 417.2001, 417.2551, 417.1894, 417.1…
As before, we construct a number of subsets of the data for use later on with the modelling and create some data subsets.
shuffle_tbl <- customer_summarystats_tbl %>%
slice_sample(n = nrow(.), replace = FALSE)
id_50 <- shuffle_tbl %>% head(50) %>% pull(customer_id) %>% sort()
id_1000 <- shuffle_tbl %>% head(1000) %>% pull(customer_id) %>% sort()
id_5000 <- shuffle_tbl %>% head(5000) %>% pull(customer_id) %>% sort()
id_10000 <- shuffle_tbl %>% head(10000) %>% pull(customer_id) %>% sort()We then construct some fit data based on these values.
fit_1000_data_tbl <- customer_summarystats_tbl %>% filter(customer_id %in% id_1000)
fit_1000_data_tbl %>% glimpse()## Rows: 1,000
## Columns: 6
## $ customer_id [3m[38;5;246m<chr>[39m[23m "C201101_0025", "C201101_0084", "C201101_0091", "C20110…
## $ first_tnx_date [3m[38;5;246m<dttm>[39m[23m 2011-01-02 18:08:43, 2011-01-07 08:04:07, 2011-01-07 2…
## $ last_tnx_date [3m[38;5;246m<dttm>[39m[23m 2011-02-28 14:14:05, 2012-07-22 20:11:23, 2011-01-07 2…
## $ x [3m[38;5;246m<dbl>[39m[23m 2, 29, 0, 9, 0, 0, 16, 2, 3, 0, 6, 0, 0, 2, 7, 100, 9, …
## $ t_x [3m[38;5;246m<dbl>[39m[23m 8.119581, 80.357865, 0.000000, 21.078214, 0.000000, 0.0…
## $ T_cal [3m[38;5;246m<dbl>[39m[23m 417.0348, 416.3805, 416.2896, 416.2141, 416.2384, 415.0…
fit_10000_data_tbl <- customer_summarystats_tbl %>% filter(customer_id %in% id_10000)
fit_10000_data_tbl %>% glimpse()## Rows: 10,000
## Columns: 6
## $ customer_id [3m[38;5;246m<chr>[39m[23m "C201101_0002", "C201101_0019", "C201101_0020", "C20110…
## $ first_tnx_date [3m[38;5;246m<dttm>[39m[23m 2011-01-01 00:20:28, 2011-01-02 16:32:41, 2011-01-02 2…
## $ last_tnx_date [3m[38;5;246m<dttm>[39m[23m 2011-01-20 08:47:25, 2011-01-02 16:32:41, 2011-01-02 2…
## $ x [3m[38;5;246m<dbl>[39m[23m 1, 0, 0, 2, 2, 4, 2, 4, 13, 7, 1, 0, 0, 8, 5, 1, 0, 29,…
## $ t_x [3m[38;5;246m<dbl>[39m[23m 2.764578, 0.000000, 0.000000, 8.119581, 5.265854, 21.74…
## $ T_cal [3m[38;5;246m<dbl>[39m[23m 417.2837, 417.0444, 417.0236, 417.0348, 417.0938, 417.0…
Finally, we also want to recreate our transaction visualisation for the first 50 customers randomly selected.
plot_tbl <- customer_transactions_tbl %>%
filter(customer_id %in% id_50)
ggplot(plot_tbl, aes(x = tnx_timestamp, y = customer_id)) +
geom_line() +
geom_point() +
labs(
x = "Date",
y = "Customer ID",
title = "Visualisation of Customer Transaction Times"
) +
theme(axis.text.y = element_text(size = 10))4 Fit Initial P/NBD Model
We now construct our Stan model and prepare to fit it with our synthetic dataset.
Before we start on that, we set a few parameters for the workbook to organise our Stan code.
stan_modeldir <- "stan_models"
stan_codedir <- "stan_code"We start with the Stan model.
## functions {
## #include util_functions.stan
## }
##
## data {
## int<lower=1> n; // number of customers
##
## vector<lower=0>[n] t_x; // time to most recent purchase
## vector<lower=0>[n] T_cal; // total observation time
## vector<lower=0>[n] x; // number of purchases observed
##
## real<lower=0> lambda_mn; // hyperprior mean for lambda
## real<lower=0> lambda_cv; // hyperprior cv for lambda
##
## real<lower=0> mu_mn; // hyperprior mean for mu
## real<lower=0> mu_cv; // hyperprior mean for mu
## }
##
## transformed data {
## real<lower=0> r = 1 / (lambda_cv * lambda_cv);
## real<lower=0> alpha = 1 / (lambda_cv * lambda_cv * lambda_mn);
##
## real<lower=0> s = 1 / (mu_cv * mu_cv);
## real<lower=0> beta = 1 / (mu_cv * mu_cv * mu_mn);
## }
##
##
## parameters {
## vector<lower=0>[n] lambda; // purchase rate
## vector<lower=0>[n] mu; // lifetime dropout rate
## }
##
##
## model {
## // setting priors
## lambda ~ gamma(r, alpha);
## mu ~ gamma(s, beta);
##
## target += calculate_pnbd_loglik(n, lambda, mu, x, t_x, T_cal);
## }
##
## generated quantities {
## vector[n] p_alive; // Probability that they are still "alive"
##
## p_alive = 1 ./ (1 + mu ./ (mu + lambda) .* (exp((lambda + mu) .* (T_cal - t_x)) - 1));
## }
This file contains a few new features of Stan - named file includes and
user-defined functions - calculate_pnbd_loglik. We look at this file here:
## real calculate_pnbd_loglik(int n, vector lambda, vector mu,
## data vector x, data vector t_x, data vector T_cal) {
## // likelihood
## vector[n] t1;
## vector[n] t2;
##
## vector[n] lpm;
## vector[n] lht;
## vector[n] rht;
##
## lpm = lambda + mu;
##
## lht = log(lambda) - lpm .* T_cal;
## rht = log(mu) - lpm .* t_x;
##
## t1 = x .* log(lambda) - log(lpm);
##
## for (i in 1:n) {
## t2[i] = log_sum_exp(lht[i], rht[i]);
## }
##
## return(sum(t1) + sum(t2));
## }
##
##
## real calculate_bgnbd_loglik(int n, vector lambda, vector p,
## data vector x, data vector t_x, data vector T_cal) {
## // likelihood
## vector[n] t1;
## vector[n] t2;
##
## vector[n] lht;
## vector[n] rht;
##
## lht = log(p) + (x-1) .* log(1-p) + x .* log(lambda) - lambda .* t_x;
## rht = x .* log(1-p) + x .* log(lambda) - lambda .* T_cal;
##
## for(i in 1:n) {
## t2[i] = log_sum_exp(lht[i], rht[i]);
## }
##
## return(sum(t2));
## }
We now compile this model using CmdStanR.
pnbd_fixed_stanmodel <- cmdstan_model(
"stan_code/pnbd_fixed.stan",
include_paths = stan_codedir,
pedantic = TRUE,
dir = stan_modeldir
)We then use this compiled model with our data to produce a fit of the data.
stan_modelname <- "pnbd_fixed"
stanfit_prefix <- str_c("fit_", stan_modelname)
stan_data_lst <- fit_1000_data_tbl %>%
select(customer_id, x, t_x, T_cal) %>%
compose_data(
lambda_mn = 0.25,
lambda_cv = 1.00,
mu_mn = 0.10,
mu_cv = 1.00,
)
pnbd_fixed_stanfit <- pnbd_fixed_stanmodel$sample(
data = stan_data_lst,
chains = 4,
iter_warmup = 500,
iter_sampling = 500,
seed = 4201,
save_warmup = TRUE,
output_dir = stan_modeldir,
output_basename = stanfit_prefix,
)## Running MCMC with 4 parallel chains...
##
## Chain 1 Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 2 Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 3 Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 4 Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 1 Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 4 Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 3 Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 2 Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 1 Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 4 Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 3 Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 2 Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 1 Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 4 Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 3 Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 2 Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 1 Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 4 Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 3 Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 2 Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 1 Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 1 Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 4 Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 4 Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 3 Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 3 Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 2 Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 2 Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 1 Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 4 Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 3 Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 2 Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 1 Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 4 Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 3 Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 2 Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 1 Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 4 Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 3 Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 2 Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 1 Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 4 Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 3 Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 2 Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 1 Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 1 finished in 27.1 seconds.
## Chain 4 Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 4 finished in 27.5 seconds.
## Chain 3 Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 3 finished in 27.8 seconds.
## Chain 2 Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 2 finished in 28.1 seconds.
##
## All 4 chains finished successfully.
## Mean chain execution time: 27.7 seconds.
## Total execution time: 28.3 seconds.
pnbd_fixed_stanfit$summary()## # A tibble: 3,001 × 10
## variable mean median sd mad q5 q95 rhat ess_bulk
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 lp__ -1.65e+4 -1.65e+4 34.1 34.6 -1.66e+4 -1.65e+4 1.00 635.
## 2 lambda[1] 1.96e-1 1.66e-1 0.127 0.109 4.55e-2 4.39e-1 1.00 3982.
## 3 lambda[2] 3.45e-1 3.41e-1 0.0618 0.0592 2.52e-1 4.52e-1 1.01 4310.
## 4 lambda[3] 1.40e-1 8.31e-2 0.164 0.0925 5.61e-3 4.95e-1 1.00 2831.
## 5 lambda[4] 3.66e-1 3.51e-1 0.117 0.118 1.97e-1 5.83e-1 0.999 4289.
## 6 lambda[5] 1.49e-1 8.34e-2 0.181 0.0983 3.82e-3 5.28e-1 1.00 2645.
## 7 lambda[6] 1.35e-1 8.15e-2 0.167 0.0928 5.16e-3 4.55e-1 0.999 2872.
## 8 lambda[7] 1.40e-1 1.37e-1 0.0352 0.0334 8.71e-2 2.00e-1 1.01 5067.
## 9 lambda[8] 1.80e-1 1.51e-1 0.121 0.103 3.75e-2 4.11e-1 1.00 3065.
## 10 lambda[9] 2.75e-1 2.45e-1 0.149 0.136 8.42e-2 5.53e-1 1.00 4007.
## # … with 2,991 more rows, and 1 more variable: ess_tail <dbl>
We have some basic HMC-based validity statistics we can check.
pnbd_fixed_stanfit$cmdstan_diagnose()## Processing csv files: /home/rstudio/workshop/stan_models/fit_pnbd_fixed-1.csvWarning: non-fatal error reading adaptation data
## , /home/rstudio/workshop/stan_models/fit_pnbd_fixed-2.csvWarning: non-fatal error reading adaptation data
## , /home/rstudio/workshop/stan_models/fit_pnbd_fixed-3.csvWarning: non-fatal error reading adaptation data
## , /home/rstudio/workshop/stan_models/fit_pnbd_fixed-4.csvWarning: non-fatal error reading adaptation data
##
##
## Checking sampler transitions treedepth.
## Treedepth satisfactory for all transitions.
##
## Checking sampler transitions for divergences.
## No divergent transitions found.
##
## Checking E-BFMI - sampler transitions HMC potential energy.
## E-BFMI satisfactory.
##
## Effective sample size satisfactory.
##
## Split R-hat values satisfactory all parameters.
##
## Processing complete, no problems detected.
4.1 Visual Diagnostics of the Sample Validity
Now that we have a sample from the posterior distribution we need to create a few different visualisations of the diagnostics.
parameter_subset <- c(
"lambda[1]", "lambda[2]", "lambda[3]", "lambda[4]",
"mu[1]", "mu[2]", "mu[3]", "mu[4]"
)
pnbd_fixed_stanfit$draws(inc_warmup = TRUE) %>%
mcmc_trace(
pars = parameter_subset,
n_warmup = 500
) +
ggtitle("Full Traceplots of Some Lambda and Mu Values")As the warmup is skewing the y-axis somewhat, we repeat this process without the warmup.
pnbd_fixed_stanfit$draws(inc_warmup = FALSE) %>%
mcmc_trace(pars = parameter_subset) +
expand_limits(y = 0) +
labs(
x = "Iteration",
y = "Value",
title = "Traceplot of Sample of Lambda and Mu Values"
) +
theme(axis.text.x = element_text(size = 10))A common MCMC diagnostic is \(\hat{R}\) - which is a measure of the ‘similarity’ of the chains.
pnbd_fixed_stanfit %>%
rhat(pars = c("lambda", "mu")) %>%
mcmc_rhat() +
ggtitle("Plot of Parameter R-hat Values")Related to this quantity is the concept of effective sample size, \(N_{eff}\), an estimate of the size of the sample from a statistical information point of view.
pnbd_fixed_stanfit %>%
neff_ratio(pars = c("lambda", "mu")) %>%
mcmc_neff() +
ggtitle("Plot of Parameter Effective Sample Sizes")Finally, we also want to look at autocorrelation in the chains for each parameter.
pnbd_fixed_stanfit$draws() %>%
mcmc_acf(pars = parameter_subset) +
ggtitle("Autocorrelation Plot of Sample Values")As before, this first fit has a comprehensive run of fit diagnostics, but for the sake of brevity in later models we will show only the traceplots once we are satisfied with the validity of the sample.
4.2 Check Model Fit
As we are still working with synthetic data, we know the true values for each customer and so we can check how good our model is at recovering the true values on a customer-by-customer basis.
As in previous workbooks, we build our validation datasets and then check the distribution of \(q\)-values for both \(\lambda\) and \(\mu\) across the customer base.
pnbd_fixed_valid_lst <- create_pnbd_posterior_validation_data(
stanfit = pnbd_fixed_stanfit,
data_tbl = fit_1000_data_tbl,
simparams_tbl = customer_simparams_tbl
)
pnbd_fixed_valid_lst$lambda_qval_plot %>% plot()pnbd_fixed_valid_lst$mu_qval_plot %>% plot()5 Fit Alternate Prior P/NBD Model
We now repeat all of the above but with an alternative set of priors and compare the outputs to give us an idea of the sensitivity of the inference to the input priors.
We will repeat this a few times, so we start by increasing the co-efficient of variation in the priors. We keep everything else constant, including the seed used by Stan.
stan_modelname <- "pnbd_fixed2"
stanfit_prefix <- str_c("fit_", stan_modelname)
stan_data_lst <- fit_1000_data_tbl %>%
select(customer_id, x, t_x, T_cal) %>%
compose_data(
lambda_mn = 0.25,
lambda_cv = 2.00,
mu_mn = 0.10,
mu_cv = 2.00,
)
pnbd_fixed2_stanfit <- pnbd_fixed_stanmodel$sample(
data = stan_data_lst,
chains = 4,
iter_warmup = 500,
iter_sampling = 500,
seed = 4201,
save_warmup = TRUE,
output_dir = stan_modeldir,
output_basename = stanfit_prefix,
)## Running MCMC with 4 parallel chains...
##
## Chain 1 Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 2 Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 3 Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 4 Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 3 Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 2 Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 1 Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 4 Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 3 Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 2 Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 1 Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 4 Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 3 Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 2 Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 4 Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 1 Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 3 Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 2 Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 4 Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 3 Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 3 Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 1 Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 2 Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 2 Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 4 Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 4 Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 3 Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 2 Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 1 Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 1 Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 4 Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 3 Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 1 Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 2 Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 4 Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 3 Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 2 Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 1 Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 4 Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 3 Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 2 Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 1 Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 4 Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 3 Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 3 finished in 61.1 seconds.
## Chain 1 Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 2 Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 2 finished in 62.8 seconds.
## Chain 4 Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 4 finished in 63.9 seconds.
## Chain 1 Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 1 finished in 65.9 seconds.
##
## All 4 chains finished successfully.
## Mean chain execution time: 63.4 seconds.
## Total execution time: 66.0 seconds.
pnbd_fixed_stanfit$summary()## # A tibble: 3,001 × 10
## variable mean median sd mad q5 q95 rhat ess_bulk
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 lp__ -1.65e+4 -1.65e+4 34.1 34.6 -1.66e+4 -1.65e+4 1.00 635.
## 2 lambda[1] 1.96e-1 1.66e-1 0.127 0.109 4.55e-2 4.39e-1 1.00 3982.
## 3 lambda[2] 3.45e-1 3.41e-1 0.0618 0.0592 2.52e-1 4.52e-1 1.01 4310.
## 4 lambda[3] 1.40e-1 8.31e-2 0.164 0.0925 5.61e-3 4.95e-1 1.00 2831.
## 5 lambda[4] 3.66e-1 3.51e-1 0.117 0.118 1.97e-1 5.83e-1 0.999 4289.
## 6 lambda[5] 1.49e-1 8.34e-2 0.181 0.0983 3.82e-3 5.28e-1 1.00 2645.
## 7 lambda[6] 1.35e-1 8.15e-2 0.167 0.0928 5.16e-3 4.55e-1 0.999 2872.
## 8 lambda[7] 1.40e-1 1.37e-1 0.0352 0.0334 8.71e-2 2.00e-1 1.01 5067.
## 9 lambda[8] 1.80e-1 1.51e-1 0.121 0.103 3.75e-2 4.11e-1 1.00 3065.
## 10 lambda[9] 2.75e-1 2.45e-1 0.149 0.136 8.42e-2 5.53e-1 1.00 4007.
## # … with 2,991 more rows, and 1 more variable: ess_tail <dbl>
We have some basic HMC-based validity statistics we can check.
pnbd_fixed2_stanfit$cmdstan_diagnose()## Processing csv files: /home/rstudio/workshop/stan_models/fit_pnbd_fixed2-1.csvWarning: non-fatal error reading adaptation data
## , /home/rstudio/workshop/stan_models/fit_pnbd_fixed2-2.csvWarning: non-fatal error reading adaptation data
## , /home/rstudio/workshop/stan_models/fit_pnbd_fixed2-3.csvWarning: non-fatal error reading adaptation data
## , /home/rstudio/workshop/stan_models/fit_pnbd_fixed2-4.csvWarning: non-fatal error reading adaptation data
##
##
## Checking sampler transitions treedepth.
## Treedepth satisfactory for all transitions.
##
## Checking sampler transitions for divergences.
## No divergent transitions found.
##
## Checking E-BFMI - sampler transitions HMC potential energy.
## E-BFMI satisfactory.
##
## Effective sample size satisfactory.
##
## The following parameters had split R-hat greater than 1.05:
## p_alive[433], p_alive[474], p_alive[476], p_alive[920]
## Such high values indicate incomplete mixing and biased estimation.
## You should consider regularizating your model with additional prior information or a more effective parameterization.
##
## Processing complete.
5.1 Visual Diagnostics of the Sample Validity
We do not repeat the full set of validation checks here, but look at the plot of effective stepsizes.
pnbd_fixed2_stanfit %>%
neff_ratio(pars = c("lambda", "mu")) %>%
mcmc_neff() +
ggtitle("Plot of Parameter Effective Sample Sizes for Alternate Priors")5.2 Check Model Fit
As we are still working with synthetic data, we know the true values for each customer and so we can check how good our model is at recovering the true values on a customer-by-customer basis.
As in previous workbooks, we build our validation datasets and then check the distribution of \(q\)-values for both \(\lambda\) and \(\mu\) across the customer base.
pnbd_fixed2_valid_lst <- create_pnbd_posterior_validation_data(
stanfit = pnbd_fixed2_stanfit,
data_tbl = fit_1000_data_tbl,
simparams_tbl = customer_simparams_tbl
)
pnbd_fixed2_valid_lst$lambda_qval_plot %>% plot()pnbd_fixed2_valid_lst$mu_qval_plot %>% plot()We now construct some error-bar plots for a subset of the customers to get an idea of the differences across the two posteriors.
comparison_plot_tbl <- list(
fixed = pnbd_fixed_valid_lst$validation_tbl,
fixed2 = pnbd_fixed2_valid_lst$validation_tbl
) %>%
bind_rows(.id = "post_label") %>%
group_by(post_label, customer_id) %>%
summarise(
.groups = "drop",
p10 = quantile(post_lambda, 0.10),
p25 = quantile(post_lambda, 0.25),
p50 = quantile(post_lambda, 0.50),
p75 = quantile(post_lambda, 0.75),
p90 = quantile(post_lambda, 0.90),
customer_lambda = customer_lambda %>% unique()
) %>%
group_nest(customer_id) %>%
slice_sample(n = 50) %>%
unnest(data)
ggplot(comparison_plot_tbl) +
geom_point(aes(x = customer_id, y = customer_lambda)) +
geom_errorbar(
aes(x = customer_id, ymin = p25, ymax = p75, colour = post_label),
position = position_dodge(width = 0.75), width = 0, size = 3,
) +
geom_errorbar(
aes(x = customer_id, ymin = p10, ymax = p90, colour = post_label),
position = position_dodge(width = 0.75), width = 0, size = 1,
) +
theme(
axis.text.x = element_text(angle = 90, vjust = 0.5)
)6 Assessing the P/NBD Models
We now focus on assess the various versions of our models without the benefit of knowing the ‘true’ answer as we will not have this information in our real-world applications.
A key derived quantity to employ is \(P_{alive}(T)\), the probability that the customer is still ‘alive’ at the point of observation of the data.
6.1 Calculating P_alive
The probability that a customer with purchase history \((x, t_x, T)\) is ‘alive’ at time \(T\) is the probability that the (unobserved) time at which he becomes inactive (\(\tau\)) occurs after T, that is \(P(\tau > T)\).
We apply Bayes’ Theorem to give us:
\[ \begin{eqnarray*} P(\tau > T \, | \, \lambda, \mu, x, t_x, T) &=& \frac{L(\lambda \, | \, x, T, \tau > T) \, P(\tau > T \, | \, \mu)} {L(\lambda, \mu \, | \, x, t_x, T)} \\ &=& \frac{\lambda^x e^{−(λ+μ)T}} {L(\lambda, \mu \, | \, x, t_x, T)} \end{eqnarray*} \]
We know the likelihood for this model, so can now substitute this in:
\[ \begin{eqnarray*} P(\tau > T \, | \, \lambda, \mu, x, t_x, T) &=& \frac{\lambda^x e^{−(λ+μ)T}} {\frac{\lambda^x \mu}{\lambda + \mu} e^{-(\lambda + \mu) t_x} + \frac{\lambda^{x+1} \mu}{\lambda + \mu} e^{-(\lambda + \mu) T}} \\ &=& \frac{\lambda^x \, e^{-(\lambda + \mu)T}} {\lambda^x e^{-(\lambda + \mu)T} \{1 + (\frac{u}{\lambda + \mu}) [e^{-(\lambda + \mu)(t_x - T)} - 1 ]\}} \\ &=& \frac{1}{1 + (\frac{u}{\lambda + \mu}) [e^{(\lambda + \mu)(T - t_x)} - 1 ]} \end{eqnarray*} \]
We now want to verify this calculation in our posterior estimates. To do this, we take a number of different cohorts of customers, and visually inspect our transaction visualisation.
pnbd_fixed_palive_summary_tbl <- pnbd_fixed_valid_lst$validation_tbl %>%
group_by(customer_id) %>%
summarise(
.groups = "drop",
p_alive_p10 = quantile(p_alive, 0.10),
p_alive_p25 = quantile(p_alive, 0.25),
p_alive_p50 = quantile(p_alive, 0.50),
p_alive_p75 = quantile(p_alive, 0.75),
p_alive_p90 = quantile(p_alive, 0.90),
p_alive_range50 = p_alive_p75 - p_alive_p25,
p_alive_range80 = p_alive_p90 - p_alive_p10,
)
pnbd_fixed_palive_summary_tbl %>% glimpse()## Rows: 1,000
## Columns: 8
## $ customer_id [3m[38;5;246m<chr>[39m[23m "C201101_0025", "C201101_0084", "C201101_0091", "C2011…
## $ p_alive_p10 [3m[38;5;246m<dbl>[39m[23m 1.063788e-85, 5.803798e-65, 1.310305e-102, 9.548350e-1…
## $ p_alive_p25 [3m[38;5;246m<dbl>[39m[23m 2.433787e-64, 9.276485e-59, 8.515800e-67, 1.568615e-87…
## $ p_alive_p50 [3m[38;5;246m<dbl>[39m[23m 8.500835e-47, 1.778625e-52, 7.813795e-42, 1.564965e-70…
## $ p_alive_p75 [3m[38;5;246m<dbl>[39m[23m 1.652795e-31, 9.765512e-47, 6.032905e-24, 1.333278e-56…
## $ p_alive_p90 [3m[38;5;246m<dbl>[39m[23m 8.281591e-22, 2.221119e-41, 7.266608e-14, 9.218306e-47…
## $ p_alive_range50 [3m[38;5;246m<dbl>[39m[23m 1.652795e-31, 9.765512e-47, 6.032905e-24, 1.333278e-56…
## $ p_alive_range80 [3m[38;5;246m<dbl>[39m[23m 8.281591e-22, 2.221119e-41, 7.266608e-14, 9.218306e-47…
We now take the customers that are highly likely to no longer be active, and highly likely to be active, and so we use the 10% and 90% percentiles.
likely_active_id <- pnbd_fixed_palive_summary_tbl %>%
filter(p_alive_p10 > 0.95) %>%
pull(customer_id)
plot_tbl <- customer_transactions_tbl %>%
filter(
tnx_timestamp < as.Date("2019-01-01"),
customer_id %in% likely_active_id
)
ggplot(plot_tbl, aes(x = tnx_timestamp, y = customer_id)) +
geom_line() +
geom_point() +
geom_vline(aes(xintercept = as.POSIXct("2019-01-01")), colour = "red") +
labs(
x = "Date",
y = "Customer ID",
title = "Visualisation of Transaction Times for Likely Active Customers"
) +
theme(axis.text.y = element_text(size = 10))This is useful, but the longer period of time prevents us from seeing the more recent transactions clearly, so we focus on the final year.
likely_active_id <- pnbd_fixed_palive_summary_tbl %>%
filter(p_alive_p10 > 0.95) %>%
pull(customer_id)
plot_tbl <- customer_transactions_tbl %>%
filter(
tnx_timestamp < as.Date("2019-01-01"),
tnx_timestamp >= as.Date("2018-01-01"),
customer_id %in% likely_active_id
)
ggplot(plot_tbl, aes(x = tnx_timestamp, y = customer_id)) +
geom_line() +
geom_point() +
geom_vline(aes(xintercept = as.POSIXct("2019-01-01")), colour = "red") +
labs(
x = "Date",
y = "Customer ID",
title = "Visualisation of Transaction Times for Likely Active Customers"
) +
theme(axis.text.y = element_text(size = 10))We see that most of the active customers have reasonably recent transactions.
likely_inactive_id <- pnbd_fixed_palive_summary_tbl %>%
filter(p_alive_p90 < 0.05) %>%
pull(customer_id)
plot_tbl <- customer_transactions_tbl %>%
filter(
tnx_timestamp < as.Date("2019-01-01"),
customer_id %in% likely_inactive_id
)
ggplot(plot_tbl, aes(x = tnx_timestamp, y = customer_id)) +
geom_line(alpha = 0.1) +
geom_point(alpha = 0.1, size = 1) +
geom_vline(aes(xintercept = as.POSIXct("2019-01-01")), colour = "red") +
labs(
x = "Date",
y = "Customer",
title = "Visualisation of Transaction Times for Likely Inactive Customers"
) +
theme(
axis.text.y = element_blank(),
axis.ticks.y = element_blank()
)Finally, we want to look at the customers with the most uncertainty in their
value for p_alive. There are a number of ways to do this, but we start by
looking at the range in values.
First we look at the distribution of these ranges.
ggplot(pnbd_fixed_palive_summary_tbl) +
geom_histogram(aes(x = p_alive_range80), binwidth = 0.02) +
labs(
x = "80% Interval Range",
y = "Frequency",
title = "Distribution of Range of Values for the 80% Credibility Range for p_alive"
)We see that most customers have their credibility intervals in a reasonably tight range, and so we can just look at customers where this range is at least 0.3.
plot_tbl <- pnbd_fixed_palive_summary_tbl %>%
filter(p_alive_range80 > 0.3) %>%
select(customer_id, p_alive_range80) %>%
inner_join(customer_transactions_tbl, by = "customer_id") %>%
filter(
tnx_timestamp < as.Date("2019-01-01"),
tnx_timestamp >= as.Date("2018-01-01")
)
ggplot(plot_tbl, aes(x = tnx_timestamp, y = customer_id, colour = p_alive_range80)) +
geom_line(alpha = 0.5) +
geom_point(alpha = 0.5, size = 1) +
geom_vline(aes(xintercept = as.POSIXct("2019-01-01")), colour = "red") +
scale_colour_gradient(low = "blue", high = "red") +
labs(
x = "Date",
y = "Customer",
colour = "Interval Range",
title = "Visualisation of Transaction Times for Non-Confident p_alive Customers"
) +
theme(
axis.text.y = element_blank(),
axis.ticks.y = element_blank()
)7 R Environment
options(width = 120L)
sessioninfo::session_info()## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
## setting value
## version R version 4.2.0 (2022-04-22)
## os Ubuntu 20.04.5 LTS
## system x86_64, linux-gnu
## ui RStudio
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz Etc/UTC
## date 2022-09-13
## rstudio 2022.02.3+492 Prairie Trillium (server)
## pandoc 2.17.1.1 @ /usr/lib/rstudio-server/bin/quarto/bin/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## abind 1.4-5 2016-07-21 [1] RSPM (R 4.2.0)
## arrayhelpers 1.1-0 2020-02-04 [1] RSPM (R 4.2.0)
## assertthat 0.2.1 2019-03-21 [1] RSPM (R 4.2.0)
## backports 1.4.1 2021-12-13 [1] RSPM (R 4.2.0)
## base64enc 0.1-3 2015-07-28 [1] RSPM (R 4.2.0)
## bayesplot * 1.9.0 2022-03-10 [1] RSPM (R 4.2.0)
## bit 4.0.4 2020-08-04 [1] RSPM (R 4.2.0)
## bit64 4.0.5 2020-08-30 [1] RSPM (R 4.2.0)
## bookdown 0.27 2022-06-14 [1] RSPM (R 4.2.0)
## boot 1.3-28 2021-05-03 [2] CRAN (R 4.2.0)
## bridgesampling 1.1-2 2021-04-16 [1] RSPM (R 4.2.0)
## brms * 2.17.0 2022-09-08 [1] Github (paul-buerkner/brms@a43937c)
## Brobdingnag 1.2-7 2022-02-03 [1] RSPM (R 4.2.0)
## broom 0.8.0 2022-04-13 [1] RSPM (R 4.2.0)
## bslib 0.3.1 2021-10-06 [1] RSPM (R 4.2.0)
## cachem 1.0.6 2021-08-19 [1] RSPM (R 4.2.0)
## callr 3.7.0 2021-04-20 [1] RSPM (R 4.2.0)
## cellranger 1.1.0 2016-07-27 [1] RSPM (R 4.2.0)
## checkmate 2.1.0 2022-04-21 [1] RSPM (R 4.2.0)
## cli 3.3.0 2022-04-25 [1] RSPM (R 4.2.0)
## cmdstanr * 0.5.3 2022-09-08 [1] Github (stan-dev/cmdstanr@22b391e)
## coda 0.19-4 2020-09-30 [1] RSPM (R 4.2.0)
## codetools 0.2-18 2020-11-04 [2] CRAN (R 4.2.0)
## colorspace 2.0-3 2022-02-21 [1] RSPM (R 4.2.0)
## colourpicker 1.1.1 2021-10-04 [1] RSPM (R 4.2.0)
## conflicted * 1.1.0 2021-11-26 [1] RSPM (R 4.2.0)
## cowplot * 1.1.1 2020-12-30 [1] RSPM (R 4.2.0)
## crayon 1.5.1 2022-03-26 [1] RSPM (R 4.2.0)
## crosstalk 1.2.0 2021-11-04 [1] RSPM (R 4.2.0)
## curl 4.3.2 2021-06-23 [1] RSPM (R 4.2.0)
## data.table 1.14.2 2021-09-27 [1] RSPM (R 4.2.0)
## DBI 1.1.3 2022-06-18 [1] RSPM (R 4.2.0)
## dbplyr 2.2.0 2022-06-05 [1] RSPM (R 4.2.0)
## digest 0.6.29 2021-12-01 [1] RSPM (R 4.2.0)
## directlabels * 2021.1.13 2021-01-16 [1] RSPM (R 4.2.0)
## distributional 0.3.0 2022-01-05 [1] RSPM (R 4.2.0)
## dplyr * 1.0.9 2022-04-28 [1] RSPM (R 4.2.0)
## DT 0.23 2022-05-10 [1] RSPM (R 4.2.0)
## dygraphs 1.1.1.6 2018-07-11 [1] RSPM (R 4.2.0)
## ellipsis 0.3.2 2021-04-29 [1] RSPM (R 4.2.0)
## evaluate 0.15 2022-02-18 [1] RSPM (R 4.2.0)
## fansi 1.0.3 2022-03-24 [1] RSPM (R 4.2.0)
## farver 2.1.0 2021-02-28 [1] RSPM (R 4.2.0)
## fastmap 1.1.0 2021-01-25 [1] RSPM (R 4.2.0)
## forcats * 0.5.1 2021-01-27 [1] RSPM (R 4.2.0)
## fs * 1.5.2 2021-12-08 [1] RSPM (R 4.2.0)
## furrr * 0.3.0 2022-05-04 [1] RSPM (R 4.2.0)
## future * 1.26.1 2022-05-27 [1] RSPM (R 4.2.0)
## gamm4 0.2-6 2020-04-03 [1] RSPM (R 4.2.0)
## generics 0.1.2 2022-01-31 [1] RSPM (R 4.2.0)
## ggdist 3.1.1 2022-02-27 [1] RSPM (R 4.2.0)
## ggplot2 * 3.3.6 2022-05-03 [1] RSPM (R 4.2.0)
## ggridges 0.5.3 2021-01-08 [1] RSPM (R 4.2.0)
## globals 0.15.0 2022-05-09 [1] RSPM (R 4.2.0)
## glue * 1.6.2 2022-02-24 [1] RSPM (R 4.2.0)
## gridExtra 2.3 2017-09-09 [1] RSPM (R 4.2.0)
## gtable 0.3.0 2019-03-25 [1] RSPM (R 4.2.0)
## gtools 3.9.2.2 2022-06-13 [1] RSPM (R 4.2.0)
## haven 2.5.0 2022-04-15 [1] RSPM (R 4.2.0)
## highr 0.9 2021-04-16 [1] RSPM (R 4.2.0)
## hms 1.1.1 2021-09-26 [1] RSPM (R 4.2.0)
## htmltools 0.5.2 2021-08-25 [1] RSPM (R 4.2.0)
## htmlwidgets 1.5.4 2021-09-08 [1] RSPM (R 4.2.0)
## httpuv 1.6.5 2022-01-05 [1] RSPM (R 4.2.0)
## httr 1.4.3 2022-05-04 [1] RSPM (R 4.2.0)
## igraph 1.3.2 2022-06-13 [1] RSPM (R 4.2.0)
## inline 0.3.19 2021-05-31 [1] RSPM (R 4.2.0)
## jquerylib 0.1.4 2021-04-26 [1] RSPM (R 4.2.0)
## jsonlite 1.8.0 2022-02-22 [1] RSPM (R 4.2.0)
## knitr 1.39 2022-04-26 [1] RSPM (R 4.2.0)
## labeling 0.4.2 2020-10-20 [1] RSPM (R 4.2.0)
## later 1.3.0 2021-08-18 [1] RSPM (R 4.2.0)
## lattice 0.20-45 2021-09-22 [2] CRAN (R 4.2.0)
## lifecycle 1.0.1 2021-09-24 [1] RSPM (R 4.2.0)
## listenv 0.8.0 2019-12-05 [1] RSPM (R 4.2.0)
## lme4 1.1-29 2022-04-07 [1] RSPM (R 4.2.0)
## loo 2.5.1 2022-03-24 [1] RSPM (R 4.2.0)
## lubridate 1.8.0 2021-10-07 [1] RSPM (R 4.2.0)
## magrittr * 2.0.3 2022-03-30 [1] RSPM (R 4.2.0)
## markdown 1.1 2019-08-07 [1] RSPM (R 4.2.0)
## MASS 7.3-56 2022-03-23 [2] CRAN (R 4.2.0)
## Matrix 1.4-1 2022-03-23 [2] CRAN (R 4.2.0)
## matrixStats 0.62.0 2022-04-19 [1] RSPM (R 4.2.0)
## memoise 2.0.1 2021-11-26 [1] RSPM (R 4.2.0)
## mgcv 1.8-40 2022-03-29 [2] CRAN (R 4.2.0)
## mime 0.12 2021-09-28 [1] RSPM (R 4.2.0)
## miniUI 0.1.1.1 2018-05-18 [1] RSPM (R 4.2.0)
## minqa 1.2.4 2014-10-09 [1] RSPM (R 4.2.0)
## modelr 0.1.8 2020-05-19 [1] RSPM (R 4.2.0)
## munsell 0.5.0 2018-06-12 [1] RSPM (R 4.2.0)
## mvtnorm 1.1-3 2021-10-08 [1] RSPM (R 4.2.0)
## nlme 3.1-157 2022-03-25 [2] CRAN (R 4.2.0)
## nloptr 2.0.3 2022-05-26 [1] RSPM (R 4.2.0)
## parallelly 1.32.0 2022-06-07 [1] RSPM (R 4.2.0)
## pillar 1.7.0 2022-02-01 [1] RSPM (R 4.2.0)
## pkgbuild 1.3.1 2021-12-20 [1] RSPM (R 4.2.0)
## pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.2.0)
## plyr 1.8.7 2022-03-24 [1] RSPM (R 4.2.0)
## posterior * 1.2.2 2022-06-09 [1] RSPM (R 4.2.0)
## prettyunits 1.1.1 2020-01-24 [1] RSPM (R 4.2.0)
## processx 3.6.1 2022-06-17 [1] RSPM (R 4.2.0)
## projpred 2.1.2 2022-05-13 [1] RSPM (R 4.2.0)
## promises 1.2.0.1 2021-02-11 [1] RSPM (R 4.2.0)
## ps 1.7.1 2022-06-18 [1] RSPM (R 4.2.0)
## purrr * 0.3.4 2020-04-17 [1] RSPM (R 4.2.0)
## quadprog 1.5-8 2019-11-20 [1] RSPM (R 4.2.0)
## R6 2.5.1 2021-08-19 [1] RSPM (R 4.2.0)
## Rcpp * 1.0.8.3 2022-03-17 [1] RSPM (R 4.2.0)
## RcppParallel 5.1.5 2022-01-05 [1] RSPM (R 4.2.0)
## readr * 2.1.2 2022-01-30 [1] RSPM (R 4.2.0)
## readxl 1.4.0 2022-03-28 [1] RSPM (R 4.2.0)
## reprex 2.0.1 2021-08-05 [1] RSPM (R 4.2.0)
## reshape2 1.4.4 2020-04-09 [1] RSPM (R 4.2.0)
## rlang * 1.0.2 2022-03-04 [1] RSPM (R 4.2.0)
## rmarkdown 2.14 2022-04-25 [1] RSPM (R 4.2.0)
## rmdformats 1.0.4 2022-05-17 [1] RSPM (R 4.2.0)
## rstan 2.26.13 2022-09-08 [1] local
## rstantools 2.2.0 2022-04-08 [1] RSPM (R 4.2.0)
## rstudioapi 0.13 2020-11-12 [1] RSPM (R 4.2.0)
## rvest 1.0.2 2021-10-16 [1] RSPM (R 4.2.0)
## sass 0.4.1 2022-03-23 [1] RSPM (R 4.2.0)
## scales * 1.2.0 2022-04-13 [1] RSPM (R 4.2.0)
## sessioninfo 1.2.2 2021-12-06 [1] RSPM (R 4.2.0)
## shiny 1.7.1 2021-10-02 [1] RSPM (R 4.2.0)
## shinyjs 2.1.0 2021-12-23 [1] RSPM (R 4.2.0)
## shinystan 2.6.0 2022-03-03 [1] RSPM (R 4.2.0)
## shinythemes 1.2.0 2021-01-25 [1] RSPM (R 4.2.0)
## StanHeaders 2.26.13 2022-09-08 [1] local
## stringi 1.7.6 2021-11-29 [1] RSPM (R 4.2.0)
## stringr * 1.4.0 2019-02-10 [1] RSPM (R 4.2.0)
## svUnit 1.0.6 2021-04-19 [1] RSPM (R 4.2.0)
## tensorA 0.36.2 2020-11-19 [1] RSPM (R 4.2.0)
## threejs 0.3.3 2020-01-21 [1] RSPM (R 4.2.0)
## tibble * 3.1.7 2022-05-03 [1] RSPM (R 4.2.0)
## tidybayes * 3.0.2 2022-01-05 [1] RSPM (R 4.2.0)
## tidyr * 1.2.0 2022-02-01 [1] RSPM (R 4.2.0)
## tidyselect 1.1.2 2022-02-21 [1] RSPM (R 4.2.0)
## tidyverse * 1.3.1 2021-04-15 [1] RSPM (R 4.2.0)
## tzdb 0.3.0 2022-03-28 [1] RSPM (R 4.2.0)
## utf8 1.2.2 2021-07-24 [1] RSPM (R 4.2.0)
## V8 4.2.0 2022-05-14 [1] RSPM (R 4.2.0)
## vctrs 0.4.1 2022-04-13 [1] RSPM (R 4.2.0)
## vroom 1.5.7 2021-11-30 [1] RSPM (R 4.2.0)
## withr 2.5.0 2022-03-03 [1] RSPM (R 4.2.0)
## xfun 0.31 2022-05-10 [1] RSPM (R 4.2.0)
## xml2 1.3.3 2021-11-30 [1] RSPM (R 4.2.0)
## xtable 1.8-4 2019-04-21 [1] RSPM (R 4.2.0)
## xts 0.12.1 2020-09-09 [1] RSPM (R 4.2.0)
## yaml 2.3.5 2022-02-21 [1] RSPM (R 4.2.0)
## zoo 1.8-10 2022-04-15 [1] RSPM (R 4.2.0)
##
## [1] /usr/local/lib/R/site-library
## [2] /usr/local/lib/R/library
##
## ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
options(width = 80L)